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We show how to combine the light-cone and matrix product algorithms to simulate quantum 
systems far from equilibrium for long times. For the case of the XXZ spin chain at A — 0.5, we 
simulate to a time of ~ 22.5. While part of the long simulation time is due to the use of the light-cone 
method, we also describe a modification of the iTEBD algorithm with improved numerical stability, 
and we describe how to incorporate symmetry into this algorithm. While statistical sampling error 
means that we are not yet able to make a definite statement, the behavior of the simulation at long 
times indicates the appearance of either "revivals" in the order parameter as predicted by [TT] or of 
a distinct shoulder in the decay of the order parameter. 



Over the last few years, it has become possible to simulate time dynamics of one dimensional quantum systems 
for very long times using matrix product methods such as the time-evolving bond decimation (TEBD) algorithm^!]. 
The computational effort required, in both time and memory, scales exponentially in the entanglement entropy of the 
system, and hence these methods work most efficiently when the quantum state being simulated has a small amount of 
entanglement. There are, fortunately, many important examples where the entanglement grows only logarithmically [5] 
and these techniques work well. Broadly speaking, slow entanglement growth seems to happen when we take a systm 
in its ground state, and then perturb it locally, by adding a particle or other perturbation at a single point. 

On the other hand, a global perturbation of the system, such as by starting the system in the ground state of one 
Hamiltonian and then evolving it under another, often gives rise to a /mear growth of entanglement entropy with time, 
making simulation exponentially difficult in time. This behavior has been studied from several directions, including 
numerically [3J, by conformal field theory techniques^], and by mathematical physics methods giving general upper 
bounds on the entanglement growth in time[5j|6]. 

One prototypical example of this behavior is the time dynamics of the spin- 1/2 XXZ spin chain. 



We start the system at time t = at the ground state of the Hamiltonian with infinite A, and then evolve the system 
for t > under the Hamiltonian with some finite A. This is a sudden quench from infinite A to finite A. The starting 
state at infinite A is given by 



This problem was studied in [3] where the linear entropy growth was found. A related problem of particles in an 
optical lattice was also studied with matrix product methods [7 . 

In this paper, we present a new approach to simulating such systems far from equilibrium. This approach combines 
the "light-cone" method introduced in [10 with matrix product techniques. The result enables us to simulate for 
significantly longer times than possible with any other existing method. 

The integrability of the XXZ Hamiltonian does not play any role in our approach. However, some of the physical 
results seen in our simulations may be a result of integrability, as discussed later. Integrability has been exploited to 
study the time dynamics of a BCS pairing model[8] after a sudden quench. The model studied in had no spatial 
structure to the interactions; instead, each fermionic mode interacted with each other fermionic model, which perhaps 
makes that model simpler to treat. However, even for that model it was necessary to use sophisticated numerical 
calculations to exploit integrability, so that for the XXZ Hamiltonian above it is not surprising that we must use 
numerics to understand the time dynamics. 

Our main physical interest in this system is to study the possibility of "revivals" in the order parameter as predicted 
by a mean-field study of the system in Interestingly, the mean-field of [11 is the same as the classical limit of the 
BCS pairing model of [5], although with very different initial conditions. If one measures the expectation value of 
on a given site as a function of time, (S'f (t)), one observes an oscillating behavior as a function of time, with damped 
oscillations (similar oscillating behavior is also observed for a related bosonic system in [7]). The expectation value 
also alternates sign as a function of site index i. By "revival", we mean that the envelope of this damped oscillating 
function may stop decreasing and instead increase for short periods of time; overall, the envelope decreases, but for 
short periods it may stop decreasing. 

It is not clear how applicable the mean-field theory of |TT] should be to the XXZ chain as the XXZ spin chain may 
be quite far from mean-field, being in one-dimension. Further, in this paper we consider the case A = 0.5, which is 
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a relatively strong interaction. However, we were interested to numerically test the dynamics of the order parameter 
at large times to see if some qualitative features of the mean-field carry over. Unfortunately, even at A = 0.5, the 
first revivals in the mean-field occur at a large time (> 20) which means that existing matrix product algorithms 
cannot reach the time to see this behavior. The light-cone method in this paper does reach the desired times. Due to 
sampling error in the Monte Carlo sampling, we are not able to make a definite statement that there are revivals based 
on our results, however the results strongly support the possibility of either revivals (an increase in the envelope) or 
at least a tendency for the envelope function to remain constant for periods of time rather than decreasing. 

The next section of the paper explains the basic idea of the light-cone, in a simpler setting of discrete time evolution. 
We show how to significantly reduce the computational cost involved in computing (on a classical computer) the 
expectation value of an operator after applying a quantum circuit to a state; the cost remains exponential, but with 
a lower exponential, albeit at the cost of some additional statistical sampling. After that, we present the matrix 
product method we used, a modified version of the infinite time-evolving bond decimation algorithm (iTEBD)[9 with 
improved numerical stability. This section is logically separate from the rest of the paper; on the one hand, one could 
use this modified iTEBD on its own, rather than as part of a light-cone simulation, while on the other hand the rest 
of the paper simply relies on using some matrix product algorithm to do the early time simulation and indeed other 
matrix product algorithms would work here. We chose this algorithm since it was best suited to our purposes with the 
least numerical effort. After that section, we describe how to combine the light-cone and quantum circuit methods, 
and in the section after that we describe our numerical results. All numerical work in this paper is done for the XXZ 
chain with A = 0.5. 



I. LIGHT CONE FOR QUANTUM CIRCUITS 

The algorithm in this paper is based on the idea of the "light-cone" . In relativistic systems, the importance of 
the light-cone is well-known. For such systems, no infiuence occurs outside the light-cone; equivalently, any two 
operators which are space-like separated commute with each other. However, even in a system described by non- 
relativistic quantum mechanics, such as a one-dimensional spin chain, there is still an upper limit to the speed at 
which any influence can propagate through the system. This bound is expressed formally through Lieb-Robinson 
bounds [T^ [T^ ITS] . Consider any operator O which acts on a single site, say site number 0. The Lieb-Robinson 
bounds can be used to show for many systems, including the XXZ spin chain, that exp{—iHt)0 exp{iHt) = 0{t) can 
be approximately described by an operator which has support only on a set of sites within distance vj^f/t of site 0, 
where vlr is called the Lieb-Robinson velocity. In contrast to relativistic systems, there is some "leakage" outside this 
light-cone; however, we can make this leakage exponentially small by slightly enlarging the support of the operator 
we use to approximate 0(t). 

In this section, we explain how the presence of a light-cone can be used to simplify the calculation of time- 
dependent expectation values. We explain this idea in a simpler setting, a quantum circuit model, in order to avoid 
the complexities of the Lieb-Robinson bound. Suppose we have N qubits on a line, labelled —N/2, —N/2+1, N/2—1, 
and we consider a discrete time dynamics as follows: on the first time step (and on all subsequent odd time steps), we 
act with a set of 2-qubit gates which act on qubits —N/2 and —N/2 + 1, qubits —N/2 + 2 and —N/2 + 3, and so on. 
Then, on the second time step (and on all subsequent even time steps) we act with 2-qubit gates on qubits —N/2 + 1 
and -N/2 + 2, qubits -N/2 + 2 and -N/2 + 3 and so on. 

We consider the following problem, which is a discrete-time analogue of the continuous time problem addressed 
elsewhere in this paper. We initialize the system to some given product state, evolve for T = N/2 time steps, and 
then measure the expectation value of the 0-component of the spin at site 0. 

How long does it take to compute this expectation value on a classical computer? The simplest algorithm is to 
store the amplitudes for the quantum state as a 2 ^-dimensional complex vector, and update this amplitude at each 
time step. The time required for a single time step for this algorithm is of order 2^ = 2^-^, and the total time is of 
order T2'^'^ . 

We now explain how to reduce this exponential from 2^^ to 2^ , at the cost of having to do some statistical sampling 
and of only approximating the expectation value. Consider Fig. 1, which shows a drawing of the gates in time. First, 
note that the gates outside the triangle have no effect on the final output, lying outside the discrete light-cone, and 
so they can be ignored. 

Before explaining how to actually do the calculation, let us motivate the approach physically, using the idea of 
entanglement. After time T/2, it is only necessary to consider the dynamics within region A marked in Fig. 1. Since 
this dynamics occurs only on the sites within distance T/2 = N/A of site 0, it occurs on a system of length T — N/2, 
and hence the cost to simulate pure state evolution on these sites is of order 2^. On the other hand, at early times 
(before time T/2) the system has less entanglement, and so also should be easier to simulate. The difficulty with 
implementing this physical idea is that at time T/2, the reduced density matrix on sites —T/2, ...,T/2 is not a pure 
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state as those sites are entangled with the region outside. Simulating the time dynamics of a mixed state is much 
more costly than simulating a pure state, taking 2^^ rather than 2"^. However, imagine that someone did a projective 
measurement of the system in some arbitrary basis of states on sites —N/2, —N/4 — 1 at time T/2, and someone 
else did a projective measurement on sites N/4 + 1, ...,N/2 — 1. The measurement has no effect on the expectation 
value of the spin on site 0. However, conditioned on the outcome of the measurement, the reduced density matrix in 
—T/2, ...,T/2 becomes a pure state. Thus, we can statistically sample different measurement outcomes and then do 
a simulation of pure state dynamics in —T/2,...,T/2. 

Now we explain the detailed approach. Let the initial state — 'i! l ® ^ r, where l,r are states on the left and 
right half of the system. Let Ua^ Ub,Uc,Ud be the unitary operators associated with the gates in regions A, B, C, D. 
Let H^ denote projection operators onto a complete basis of states, labelled by index a, on sites —N/2, —N/4 — 1 
and let H^ denote projection operators onto a complete basis of states, labelled by index /3, on sites N/4 + 1, ...,N/2. 
We wish to compute 

i^L ® '^bPIuIjUIuIs^UaUcUdUbI'^l ® ^-fl) (3) 

= {(Ub^l) ® {UD^R)\Ul,UlS^UAUc\iUB^L) ® (Ud'^r)) 

= Y^HUb'^l) ® {UD'^R.m!^Iiplu\S^UAUcIi^U^\{UB'^L) ® (Ud'^r)) 

a, 13 

= ^(([/S^'i) ® (UD'^R.m^nfliUB'^L) ® (C/D^-fi)) 

a, 13 

{{Ub'^l) ® {UD^R)\Il^Ilplu{S^UAUcIi^IlJ^\iUB'fL) ® (t/u^fl)) 

(((Jb^'l) ® (f/i)«'i?)|H^H||(f/B«'i) ® (t/zj^-fl)) 

where the second equality follows because the projection operators H^,H^ commute with the unitaries Ua, Uc\ i.e., 
because the projection operators are outside the light-cone. Interpreting 

{{Ub-^l) ® (C/D*i?) iH^Hf I (C/b^-l) ® {Ud-^r)) (4) 
as a statistical weight, we can compute the desired result by sampling 

{{Ub^l) ® (C/D«'j?)|H^H^C/^C/^5o'f^AC/cn^n||(C/B*L) ® {Ud^r)) 



with weig ht Q. Define ^a,i3 by 



(5) 



l*a,/3> = • (6) 

II^I11\{Ub^l)®{Ud^r)) 



Then Eq. ^ is equal to 



(*a,/3|^0l*c.,/3), (7) 

and (*L (g) ^r\uIu^jjUI;U\S^UaUcUdUb\'^l ® is equal to 



{^^Asmo.,p), (8) 

where (...) denotes an average over Monte Carlo steps with weightj4|. 

The light-cone algorithm for this quantum circuit is to sample (Isl with weight Q. All the calculations described 
here, such as calculating Ub'^l or Ud'^r, can be done in a time of order rexp(T), rather than rexp(2T), and 
hence the light-cone algorithm takes a time of order Texp(r) for each Monte Carlo sample. Since the operator S'q 
has bounded operator norm, the root-mean square fluctuations in the expectation value are bounded, and hence the 
sampling error decreases as one over the square-root of the number of samples. 

The basic idea of the light-cone quantum circuit method described in this section is the same idea used in [TD] where 
it was applied to continuous time dynamics. In the rest of this paper, we again treat continuous time dynamics, but 
in contrast to |10j we use a matrix product algorithm to perform the early time simulation. The use of the matrix 
product algorithm relies on the limited entanglement at early times (this contrasts with the approach in this section, 
where instead we factorized the early time dynamics into a product of two different unitaries, Ub and Ud, and relied 
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FIG. 1: Using the light-cone method for a quantum circuit. Triangle includes sites within the light-cone. Qubits are arranged 
along the horizontal axis, time increases vertically. We show the first round of the quantum circuit. Regions A,B,C,D are as 
described in the text. 



on the product nature of the initial state). One complexity in the matrix product simulation is that the probability 
distribution of outcomes a, /? will not factorize into a product of separate distributions for a and /3 because there may 
be some entanglement between those portions of the system. We will explain how to resolve this problem. Another 
difference in the calculations in the rest of the paper is that the early time evolution using the matrix product evolution 
will be done for a time which is slightly more than half the final time, rather than exactly half as it is here. 

II. MATRIX PRODUCT ALGORITHM 
A. Modified iTEBD 

To do the matrix product simulations, we used a modified version of the iTEBD algorithm with improved numerical 
stability. Before describing the improved algorithm, let us recall how iTEBD works. The algorithm stores a matrix 
pro(luc;t state for an infinite system. We have matrices r"*(s) and r^(s), for the even and odd sites respectively, 
where s labels the spin on the site. There are also diagonal matrices A"^ and defined on the bonds directly to the 
right of the even and odd sites. The wavefunction amplitude for a given configuration of spins, so, si, S2, ••• is equal 
to ■!/'(••••. SOj Si: ■S2, ....) = ...r"^(.so)A'''r'^(si)A"^r^''^(.S2)A'^... Thc matrices A^'"^ are chosen such that their diagonal 
coefficients coincide with the Schmidt coefficients of the state decomposed across the given bond; this contrasts with 
other matrix product state representations used in other algorithms where the algorithm only stores a single matrix 
for each site. 

The iTEBD matrix product state can represent a system with translational invariance of period 2. It is thus very 
convenient to use for our system, which has such invariance in its initial conditions. The time evolution is simulated 
using a Trotter-Suzuki method. The time evolution is broken up into small steps 5t, and the time evolution is 
approximated by a series of two-site unitary transformations, U^^ = and U^^ = (gj^C/'^''"^'^''!, where 

[/['%'■+!] ^ exp(— is a unitary acting on sites r,r + 1. By updating first with U^^ and then with U^^, 
the state maintains translational invariance at all times. In our implementation of the modified iTEBD algorithm 
described below we use a second-order Trotter-Suzuki decomposition with a time step of 0.0625. 

In a single update step, the iTEBD algorithm proceeds as follows. Consider the update under U^^ . The matrix 
jj[2r,2r+i\ j^g^ matrix elements U^^^''^^'^]^ , , where sa, sb represent the spins on the A and B sites after acting with 
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U and s^, s'g represent the spins on the A and B sites before acting with U . First, compute the matrix 

e(sA,ss) = A^C(sA,ss), (9) 

where we define the matrix C by 

c{s^^ss) = E <S.'.r-4(-:4)A-^r^(^:.)A^. (10) 

Thus, 

e(.^,..) = Y.^ uf:^l,j^T\^^x^v-{^^x^. (11) 

This matrix C is not part of the usual description of the iTEBD algorithm, but we define it for use later. Note that 



in Eq. (10) the quantity C^f^'^'g^^v' gi is a complex number, and all other quantities T^-^ and A"*'^ represent matrices 
which are multiplied following the usual rules of matrix multiplication. 

The matrix 0{sa, sb) has matrix elements 6{sa, SB)a,fj- Thus, is a four index tensor, with indices s^, Sb, ex., (3. In 
the second step, do a singular value decomposition of according to the index bipartition a, sa and P,sb, so that 

d{sA, Sb)q,/3 = E Xa,SA;l3^^l3y0-n,SB ■ (12) 
P 

The subscript notation a, s; (3 is intended to indicate that we treat X as a matrix with rows labelled by a and s and 
columns labelled by fi. 

In the third step, define the matrices T^^^ by their matrix elements T^'^{s)af) as follows: 



r^(s)o/3 - (A^„j (13) 

The matrices r"^,r^,A'^ replace the old matrices r"^,r^,A"* after this update step, while the matrix A'^ remains 
unchanged under the update step. Finally, we perform a truncation of the state: after each step, the bond dimension 
increases, and we truncate the state by keeping only the kmax largest singular values of 9, discarding the others. The 
quantity k^ax determines the maximum bond dimension, with increasing k^ax leading to an increased accuracy in 
return for additional numerical effort. The step with the largest numerical cost is the singular value decomposition, 
which requires a computational time scaHng as 0{k^g^^). 

Unfortunately, we found some difficulties with numerical stability in our implementation of this algorithm for larger 
(> 1000) values of k^ax- The matrices A'^ will contain some very small diagonal entries, and therefore (A'^)^^ will be 
very large. Therefore, any small error in the singular value decomposition will tend to get magnified when multiplying 
by (A^)-i. Note the sequence of steps: first we multiply by A'^, then we do a singular value decomposition, and then 
we multiply by (A'^)~^, so that problems can arise if the singular value decomposition in between the multiplication 
steps is not numerically exact. 

The solution to this problem is simple. First, we changed which matrices are stored by the algorithm, so that 
instead of storing the matrices r^,r^, we store the matrices A^,A^ defined by 

A^(s) =r^(s)A^, (14) 
A^{s)eeT^{s)X^. 

These matrices A-^, A^ are the usual A matrices defining a matrix product state. Such a relation A{s) — r(s)A was 
used previously in [TB], but [TB] used a method of doing the update which still required a division by A while the point 
of the method described below is that it does not require any division (other minor differences are that 16J was not in 
the context of infinite systems and also in [16j reduced density matrices were diagonalized while we employ singular 
value decomposition instead). 

We continue to store the matrices A"^, A^. Then, in an update step, we first compute 

dsA.SB) ^ E U'^:Z:^l,A^{s'A)A^is'A)- (15) 
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Note that if we substitute Eq. ( 14 1 int o p5| we find the same result for C as ( 10 ) . We then compute the matrix 
Q{sa-,sb) from C{sa,sb) following E q. (|10[ ). We next do a singular value decomposition of 9 as in Eq. ( 12 1, obtaining 
matrices X, y, A'^. In analogy to Eq. (^14|7we now wish to compute the updated matrices A which are defined by: 



(16) 



However, in order to compute the updated matrices A^, A^ , we do not do this by computing f ^, f ^ and applying 
(16 1. Instead, we compute them in a different way. To derive the approach that we use, note that Eq. (13 1 implies 



that Eq. ( 16 ) is equivalent to 



and 



(17) 



(18) 



7, SB 

^ C{sA,SB)a'i{Y^^'j"f,SB;l3, 



7. SB 



where the matrix Y^^ has the opposite index bipartition to matrix Y (that is, ^,sb', (i instead of /3; 7, sb)- 

Note that Eq. (17 1 does not involve multiplying by A~^. Similarly, the last line of Eq. (18 1 gives an expression 



for A'^ that docs not involve multiplying by A~ , so this method of computing A docs not have the same problem of 
numerical stability. Further, no extra CPU time is required to compute C on the last line of (18 1 since the algorithm 
computes this already as part of computing the matrix 6. Finally, since Y is unitary, we have Y^^ — Y^ so there is 
no overhead to compute the inverse of Y . That is, Eq. ([T8| implies that 

(19) 



1,se 



Thus, we can use Eq. dlTpt 

to improve the stability of the algorithm. The only extra overhead required is one 
extra matrix multiplication, to multiply by 1"^ as in (19 1. The overhead to do this is small compared to the time 



required to do the singular value decomposition of 9. The complete pseudo-code for a single update step is 



1. Compute C by Eq. (15 1 and then compute Q by Eq. n9[ 



2. Do a singular value decomposition as in Eq. ( 12 ) 

3. Update by Eqs. (|l7|l9l. 



B. Incorporating Symmetry 

Both the iTEBD algorithm and the modified iTEBD algorithm discussed here make it very easy to incorporate 
symmetry such as conservation of total as follows. For a Hamiltonian such as the XXZ Hamiltonian ([l]) and for 
initial conditions which have definite on each site as we have considered, it is always possible to choose the basis 
vectors of the Schmidt decomposition to have definite . Thus, for each bond variable a, we can assign a definite 
S^, giving the total of the spins to the left site of the given bond. The infinite size of the system does not give any 
trouble with assigning a definite S^, as we simply define the of the system to the left of a given bond to be the 
difference between the total spin to the left and the initial spin to the left. This difference can be easily kept track of 
because it increases or decreases by one when a single spin up moves across the bond to the left or right. 

After doing the singular value decomposition for each 5^ , we get a list of Schmidt coefficients for each . If there 
are more than k^ax different Schmidt coefficients, we truncate. To do the truncation, we merge these lists, sort the 
merged list from largest to smallest, find the kj^ax largest Schmidt coefficients, and keep those. This means that the 
number of Schmidt coefficients we keep for each will depend on ; wc find that we keep many Schmidt coefficients 
for near zero, and fewer for S*^ far from zero. 
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Some fluctuation is observed in how many Schmidt coeflicients were kept for each S*^, but a typical set of values 
for k^ax = 4096 was 2 at S*^ = -5; 28 at S*^ = -4; 142 at S"^ = -3; 403 at S'^ = -2; 748 at = -1; 968 at 5^ = 0; 
890 at 5^ = +1; 576 at S*^ = +2; 254 at S"^ = +3; 71 at S"^ = +4; 12 at = +5; and 2 at S*^ = +6. Note the 
asymmetry between positive and negative S^, since the alternating spins in the initial configuration, combined with 
our definition of as the total spin to the left of a given bond, breaks the symmetry. To explain this asymmetry 
further, consider a site which in the initial configuration is spin up, with its neighbor spin down; the only way that 
the spin to the left of this bond can change at the first instant of time is for the up spin on the given site to hop to 
the left, increasing the total spin to the left, but there is no corresponding process which would decrease the spin to 
the left in the first instant of time. 

The most computationally costly step is singular value decomposition. The bond variable (3 in Eq. (12) has an 
which is equal to the of bond variable a plus the z-spin of sa- Thus since sa takes two different values, there 
are two different values of bond variables a which contribute to a given value for bond variable /3. Thus, for 
the specific number of Schmidt coeflicients kept which is given in the above paragraph, the largest matrix that we 
decompose has bond dimension 968 + 890 — 1858. Without the use of symmetry, keeping 4096 bond coeflicients would 
require decomposing matrices of size 8192. 



III. COMBINING LIGHT-CONE AND MATRIX PRODUCT 



We now describe how we combine the matrix product algorithm with the light-cone method. The matrix product 
simulation will be accurate for a certain range of times; the larger k^ax is the longer it works. We run the algorithm 
for as long a time as possible for the given kmax- Let this time be tinu- For k^ax — 4096, we could take tinn slightly 
larger than 16 before encountering appreciable truncation error. The kmax required to reach a given tinu at a given 
truncation error increased exponentially with tinu- After simulating to time imit, we save the matrices deflning the 
matrix product state. A separate program then performs the second phase of the simulation as follows. 

We wish to compute the expectation value of the z-component of the spin on a site, say site 0, at a time i/i„ > tinu- 
That is, {S§{tfin))- The matrix product simulation gives a matrix product state |V'™^*(imit)) which is a good 
approximation to the state \tp{tinit)), where 

iV'(imit)) = exp{-iHtimt)\i') ■ (20) 

Then 



{So{tf^n)) - (V'"P^(i™t)|exp[lif(t/„ - Un^t)]S^exp[-iH{tf,n - Un^t)M"'''%kn^t))■ 



(21) 



Even in this continuous time setting, the situation is similar to that in Fig. 1: the dynamics of the system outside this 
light-cone has little effect on the spin. The Lieb- Robinson bounds jT2l [13l [HI [15] make this intuition precise; using 
these bounds we can prove that we can approximate (21 1 by 



(22) 



where H includes only the interaction of sites within some distance / of site i. Thus, 



H 



loc 



\ ^ QX nx 



i=-l 



(23) 



As long as I > VLR{tfin — tinit), then it is possible to use the Lieb- Robinson bounds to prove that the approximation 



(22 1 is exponentially good, where v^ji is the Lieb-Robinson velocity. Previous experience with other light-cone 
techniques tells us that in fact the approximation is good so long as I > Vsw{tfin — Unit) where Vgw is the spin- wave 
velocity, given by 

Vs^^i7T/2)sm{d)/9, (24) 

where cos(6') = A[17j. Note that the dynamics we consider takes place in a highly excited state, rather than in the 
ground state, so a priori it is not obvious that Vsw is the correct velocity for excitations, but numerical evidence both 
previously and in this work indicates that it still is the correct velocity in this regime. Note also that Vsw < vlr for 
this system. 

Let C denote the set of sites to the left of site —I and TZ denote the set of sites to the right of site +1, as shown 
in Fig. 2. Imagine that at time tinit someone measured that state of the system on L, doing the measurement in 
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FIG. 2: Sketch of regions £,7^ for I = 2. 



the Schmidt basis, and also measured the state of the system on i?, again in the Schmidt basis. Physically, this 
measurement, being outside the hght-cone, should not affect the measurement of the spin on site at time i/i„. 
Mathematically, we can express this as 

Q,/3 

where II^ projects onto state a in L and 11^ projects onto state /3 in i?, with 11^ = 1. Note that the correctness 
of (25l depends on the fact that 11^ and 11^ commute with exp[i_ff''"^(i/i„ — fi„it)]S'Q exp[— i_ff'°^(i/i„ — tinit)\- 



Note that 



^(V''"^'(i,mt)|n^n^exp[ii7'°^(t/„ - t„„t)]5o' exp[-zff'°^(i;„, - U^,ml^'^\i'^''\Umt)) (26) 



Q,/3 

= ^(V™f^(i,„,,)|n^n^|v™^'^(i,™t)) X 

Ea,;3(V''"^'(W)|n^nf exp[^H'°^(t/,„ - t,„,t)]^gexp[-ig'°^(t/,„ - ^.„.t)]n^n^|V;'"p^(f.n^t)) 

Ea,/3(^"^^(*™t)|n^n«|V''"j'^(t„,i)> 

Thus, if we statistically sample with the weight 

(^"^^(t™t)|n^n^|V"^^(t™t)) 

we find that 

(V''"^^(i^n,t)|exp[zi?'°=(i;„ - t„,0]^0 exp[-zi/'°^(t/„ - U^^t)W''\t^u^t)) 

_ (^"'P"(t^n^t)|^^^gexp[^g'°^(t/,„ - t,„,t )] exp[-ig'°^(t/,„ - t,„,t)]nj;nf |7/;'"P«(t„,t)) 

~ Ea,,3(V'"^'^(^™t)|n^nf|7^-f^(t,™t)) 

= ^^P[*-^'°''(^/"' ^ ^mit)]^^ exp[-jff'°=(t/i„ - tinit)\\iC^^ ^tinit)) 



(27) 

(28) 



where (...) denotes the average over Monte Carlo steps with weight (27 1, and 

n^nf|7/."p^(t,„,t)) 
n^nf|^"p«(t,„,0> 



|V'")^''(t»mt)) 



(29) 



The light-cone algorithm does the statistical sampling of ( 28 1 with weight ( 27 1 . For each Monte Carlo step, the 



algorithm does three things. (1): Pick a,/3 according to the probabihty distribution (27 1; (2): Calculate the state 
|'0™^*(imit)) ill the computational basis (a product basis in which each site has definite spin up or down); (3): Evolve 
the state (tinit)) forward in time for a time iyi„ — ti^it and compute the expectation value of 5*0. We explain 

how to do each of these steps in turn. 
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A. Sampling q, /3 



To sample a, (3 according to ( |27[ ), we proceed in a series of steps. The difficulty that we encounter is that a and (3 
are correlated: the probability distribution in (27 1 does not factorize. We will show how to overcome this by replacing 
this probability distribution by a product of conditional probability distributions which are easier to compute, and 
then sampling each of those probability distributions. Finally, after explaining our approach, we will explain why 
an alternate approach, which at first sight seems more natural, is not good because it requires much more computer 
time. 



Note that 



where 



and 



Note that 



\a) 



E 



(30) 
(31) 

(32) 
(33) 



so that P{[3\a) can indeed be interpreted as a conditional probability distribution. Therefore, we can first pick a 
randomly according to the probability distribution P(a) and then choose /3 randomly according to the probability 
distribution P{(3\a). Choosing a is easy, since P{a) is given by the square of the corresponding Schmidt coefficient, 
and these Schmidt coefficients are the diagonal entries of A. Choosing j3 is more difficult, since a and fi are correlated. 

To choose we proceed as follows. Define HI to project onto the state of site i with spin up, and define 11* to 
project onto the state with spin down. Then, we have 



P(/3|a)= P{s-i\a)P{P\s^ia). 



where 



and 



P{s^i\a) 



\s-ia) 



(V'"P^(^^mt)I^^IV'"P^(^^n^^)) ' 

(V''"^^(t,„.t)|n|nj'n^|V;'"p«(t,„,0) 



(34) 



(35) 



(36) 



Note that P{] \q) + P{[ \a) = 1 so that we can interpret P{s\q) as a conditional probability. 
Repeating this, we find that 



S-l + l S-i 



where 



P{S^\S^ 



■ s-io) 



( V'"^^ {Umt Ws, n'-_\ . . .n^^ I V;'"^- {u^,t ) ) 



(38) 



Thus, we can first sample a randomly as before, and then randomly sample the spin on site —I conditioned on that 
a, randomly sample the spin on site — ^ + 1 conditioned on the given a and s_;, and so on, until all spins are sampled, 
and then finally sample (3 according to 



P{l3\si...s-ia) = 



(^"^^ {Ur.^t) ingnj, . . .n^i I y^^p- (t,„,, ) ) 



(39) 
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Computing each of these conditional probabihties can be done in a time 0{kfj^g^^) as follows. After j steps, we store 
the vector A"^'-^(s_/+j_i)...A-^'^(s_;)n^|-0'"P^(timt)), where the A matrices in this expression are alternately chosen 
to be or A^ depending on whether the site is on the odd or even sublattice. We then compute the two different 
vectors A^'^ {s-i+j)A^'^ {s-i+j-i)...A^'^ {s-i)Il^\ip"^'^'^ (tinit)) for the two different choices of S-i+j, and compute 
the norms of these vectors to determine the conditional probability. After choosing S-i+j, we keep the corresponding 
vector and use it in the next step. The matrix- vector multiplication takes a time 0{k^^^). 

This solves the problem of statistically sampling a, (3. Finally, we would like to discuss why a different, seemingly 
more natural approach to the problem is not as good as the approach explained here. This different approach would 
be first statistically sample a as discussed here. Then, initialize a matrix p to the state |q:)(q:|. The matrix product 
state defines a CP map. In fact, it defines two different CP maps, one for odd sites and one for even sites, which we 
can call £;ei;en £odd Propagate the state p through these two CP maps, alternately applying the even and odd CP 
maps, applying a total of 2/ + 1 maps. If I is even, the result is £'^^'^"{£°''''^ ...{£'^'"^"{p)...), while if I is odd then instead 
we apply £°'^'^{£^'"^'^ ...{£°'^'^{p)...). The result gives the reduced density matrix on sites — oo, +/, given that on sites 
— oo, —I the system is in state a. The diagonal entries of this density matrix give the probabihty distribution of /3. 
However, this approach, while more natural, requires performing matrix-matrix multiplications, and hence would be 
much slower than the matrix-vector multiplications that we used above. 

It is important to understand that after sampling a, (3, we do not make any further use of the randomly sampled 
values of the spins S-i, ...,si that we found in this step. We discard those values S-i, ...,si and keep just the a, /? for 
the next step. Different spin configurations associated with a given a, /3 will be added coherently, as seen in the next 
step where we compute the amplitude to be in each spin configuration for the given a, (3. 



B. Computing 

The next step is to compute the amplitudes for ^"^^ in the computational basis. There are 2^'+^ different amplitudes 

that we need to compute. We first explain a simple (and slow) way of computing these amplitudes, and then describe 
a much faster "meet-in-the-middle" way to compute these amplitudes which is what we actually used. 

The simple way is as follows. Suppose I is even (if I is odd, then the sublattice indices A, B will be reversed in this 
paragraph and the next). We compute the amplitude to be in each of the 2^'+-'^ different states in turn for the state 
{tinit)) , and then we normalize the state after. To compute the unnormalized amplitude to be in the state 
si,si-i, ...,S-i, first compute the inner product {a\A^{s-i)...A^{si-i)A^{si)\(3), where {a\ is the vector with a 1 in 
the a-th entry, and zeroes elsewhere. Computing the inner product takes a time {21 + l)0{k^^^^). Thus, the total 
time is 

(2/+l)22'+iO(fcLj. (40) 

Note that since we compute unnormalized amplitudes, it does not matter that the inner prod- 
uct {a\A'^{s^[)...A^{.si^i)A'^{si)\f3) has an extra factor of A"* at the end compared to the expression 
(a|r'^(s_()A'^...A'^^-^(.s;_i)A-^r^(s()|/3). We will normalize the amplitudes later. 

A much faster way is a meet-in-the-middle approach. For each of the 2'+^ different configurations of the spins 
S-i, s_i, So, we compute the vector {a\A'^{s-i)...A^(so)- Note that this vector is in the auxiliary Hilbert space of 
dimension given by the bond dimension. Also, for each of the 2' configurations of the spins s;, s;_i, Si, we compute 
the vector A^{si)...A^{si)\P). Computing all of these vectors takes a time 

(/+l)2'+iO(A:Lj. (41) 

We save all of these vectors (the memory requirement for this is 0{kmax'2^^^) which means that as long as 2'+^ < kmax 
the additional memory required is negligible). Then, for each of the 2^'+^ spin configurations of spins s;,...,s_; we 
compute the inner product of the vectors {a\A'^{s-i)...A^{so) and A^{si)...A^{si\f}). This takes a time 

2^'+'0{kma.), (42) 

and hence the total time required is 

{I + l)2'+iO(fcLj + 2^'+iO(fc„„.). (43) 

Again it is possible to make use of symmetry in these calculations. We compute the vector A^ {si)...A-^{si)\l3) 
in I different multiplications, and after each multiplication the vector has a definite S^. This greatly speeds up the 
matrix-vector multiplications. Also, the resulting vector ^^^^ has a non-vanishing amplitude only in a single 
sector, which reduces the memory requirement and speeds up the calculation of the time evolution described in the 
next subsection. 
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C. Time Evolution of 

Finally, we time evolve the state for time tjin — tinu- We do this evolution using sparse matrix-vector 

multiplication as follows. Before doing any Monte Carlo sampling of the states, wc build a representation of the 
Hamiltonian as a sparse matrix. Each row of the matrix has 0{l) non- vanishing elements. 

Then, we fix a small time step 5t and evolve the vector over several of these small time steps. In the simulations 
here, wc choose (5t = 1/3. In this way, a single Monte Carlo sample allows us to compute the expectation value of Sq 
at several different times. To evolve for a time step 5t we approximate 

^raa x TTn 

eMiHStm « V (-i)" 1^-), (44) 

where we truncate the Taylor series expansion at a finite order Umax- The vector iJ"|\E') can be computed using 
sparse matrix- vector multiplication. The total computational time required is 

0(n™a.(Z+l)22'+^). (45) 

In practice, this step is very fast compared to the previous step of computing for the values of kmax and I that 
we choose. 



D. Pseudo-Code 

We recap this description of the procedure by giving the pseudo-code for a single Monte Carlo step. 

1. Choose a randomly according to the probability distribution |A^(a)p. 

2. For i = -I to i ^ +1 do 

2a. Use the stored vector A^'^ {si-i)...A'^'^ {s-i)n^\^"'P''{tinit)) to compute 

A^'^{si)A^'''{si-i)...A^'''{s-i)n^\^"'P%Unit)) for both choices of S-i+j. 
2b. Randomly choose S-i+j according to P{si\si-iSi-2---S-ia). 

3. Randomly sample /? according to P{(3\si...S-ia). 

4. For each of the 2'+^ configurations of spins —l,...,0 compute the vector {a\A^{s-i)...A'^{so)- Save for use in 
step 6. 

5. For each of the 2' configurations of the spins s;, s;_i, si, compute the vector A^ (si) ...A^ {si)\(3) . Save for use 
in step 6. 

6. For each of the 2-^'+^ spin configurations of spins si,...,S-i compute the inner product of the vectors 
{a\A^{s^i)...A^{so) and A^ {si)...A'^{si)\/3). Save the result as the amplitudes of the state ^^^^ in the com- 
putational basis. 

7. NormaUze that state 

8. Evolve state forward by a sequence of time steps St until time tfin- Compute spin expectation on site 
after each time step. 

E. Time Estimates 

The total time required for all steps is 

0(/2'fcLx + I{tfin/St)nmax2''^)- (46) 

The dependence on tfin and Umax is unimportant compared to the exponential dependence on /, since it suffices to 
take Umax = 0{l) to obtain accurate results. Looking only at exponential dependence on I and polynomial dependence 
on kmax, and ignoring any polynomial dependence on /, we find 



(47) 
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This is to be compared with a time ©(fc^^^) for the matrix product simulation (again, ignoring the dependence on 
the number of Trotter-Suzuki states, and only considering the dependence on kmax)- Thus, equating the two times, 
we find that we can take 

2' - k^ax- (48) 

This in fact is roughly the regime in which we worked below, since we took I — 10 and kmax — 4096. Similarly, since 
we work in the regime 2'+-'^ < k^axi our additional memory requirements are negligible. 

In fact, our time to do a single Monte Carlo step was much faster than our time to do the matrix product 
simulation. This was balanced by the fact that we have to do many Monte Carlo steps. Fortunately, the Monte Carlo 
steps parallelize trivially. 

In principle one could avoid any Monte Carlo sampling by summing over all possible values of a, (3, appropriately 
weighted. While this calculation parallelizes well, it would not be practical for large values of kmax- 



IV. RESULTS 



We simulated using the modified iTEBD for k„iax ranging up to 4096. Accurate results were found up to a time 
slightly greater than 16, with low truncation error. Unfortunately, we do not know a good way to directly compute 
truncation error in iTEBD. For a finite size system, one can compute the difference between the state after truncation 
and the state before truncation. This gives an upper bound on how the truncation affects expectation values. For an 
infinite system, this doesn't work. To see why this doesn't work, consider a finite but large system: even if we make 
only a small error in the state everywhere and are able to accurately approximate all the local expectation values, we 
make a large total error in the state vector because the system is large. So, we estimated truncation error in a few 
different ways. We checked how certain observables, such as 5*0, depended on kmax- For a given kmax < 4096, the 
curve would follow the kmax = 4096 curve for some time, and then deviate. We could see how the time of deviation 
depended on k^axi and extrapolate to k^ax — 4096. 

Also, we took results from the iTEBD simulation at times 15.0625 and 16.0625 and started the Monte Carlo 
sampling at those two different times and compared results. When we compared two different Monte Carlo averages, 
one with tinu — 16.0625 and one with tinu = 15.0625, both having the same I, we observed that the results agreed, 
up to statistical error, until a time tfin ~ 15.0625 + l/vgw- That is, the deviation between the two curves only arose 
because the curve started at tmit = 15.0625 saw the finite-size effect of a finite I earlier than did the curve started 
at Unit — 15.0625. Thus, we are fairly confident about the accuracy of our modified iTEBD simulation up to time 
16.0625 up to times of roughly 22.5 as shown in the figures below. 

It took roughly 2 CPU days on a 2Ghz Opteron to do 1000 Monte Carlo samples for I — 10. We did 98000 samples 
for tinit = 15.0625 and 82000 for tinu — 16.0625. We did several additional runs for smaller / and smaller tmit to check 
the algorithm. In Fig. 3 we show the computed behavior of the magnetization {{SQ(t))) as a function of time using 
the modified iTEBD with kmax = 1024,2048,4096. For any given kmax, the accuracy of the algorithm breaks down 
badly at sufficiently long time. Doubling kmax leads to only a constant increase in the time which can be simulated, 
which implies that reaching times > 20 will require prohibitively large matrices. 

In Fig. 4 we show a comparison of the light-cone method and iTEBD with kmax = 4096. The difference between 
the two different light-cone curves is within sampling error for the given number of samples. The root-mean square 
sampling error was roughly 0.00025. By increasing the number of samples this error can be reduced as the square-root 
of the number of samples. 

In Fig. 5 we show the mean-field of [11) . This curve shows an increase in the order parameter at times > 20, 
followed by a decrease at later times. The curve is well-described by beating together two different cosines, with a 
l/i"^/^ envelope. The revivals come from the beats. 

In the black line in Fig. 6 we show the time series of peak heights for each maximum in the absolute value of 
{SQ{t)). Each circle represents a given time at which a maximum occurred. Circles at early times are from iTEBD, 
circles at later times are from light-cone. Note the non-monotonic behavior. The y-axis is a logarithmic scale. The 
non-monotonic behavior is within sampling error, so it is possible that the peaks do decay monotonically. However, 
we are fairly confident that even if the peaks do decay monotonically there is a distinct feature in the curve showing 
a flattening of the decay at later times. 

It is possible to reduce the sampling error further by more runs. A ten- fold increase in runs would enable more 
definite statements about the monotonicity of the peak heights. Interestingly, the root-mean-square fluctuation in the 
mean of a given light-cone curve (such as the curve with Unit = 16.0625) is much greater than the root-mean-square 
fluctuation in the difference of the y-coordinate of the curve at two different times. So, it may be possible to reduce 
sampling error by shifting the y-axis by a constant to agree with the known correct result at early times. We have 
tested this, as shown in the red line in Fig. 6 and the result indicates that the decay of peak heights almost stops at 
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Time 



FIG. 3: Comparison of iTEBD for k^a:, = 1024, 2048, 4096. 




Time 



FIG. 4: Comparison of light-cone and iTEBD with kmax = 4096. 



around time t — 20, although in this case the peak heights no longer increase at time 21.5. More runs will be required 
to be certain whether the peak height is actually non-montonic or whether there is simply a sharp shoulder in the 
graph of peak heights. 
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Time 



FIG. 5: Mean-field of [TT]- 



V. DISCUSSIONS 

We have developed a method combining the hght-cone technique with matrix product methods. While the compu- 
tational complexity is still exponentially large as a function of time, the exponent is lower than using matrix product 
methods alone. As a result, we are able to go to significantly longer times for the same computational effort. We have 
discussed several different ways of doing the sampling depending on the computational resources of time and memory. 

While we used a cluster to perform statistical averaging, the light-cone method makes it possible to perform 
simulations for shorter times with very small computational cost. If we reduce Unit, then the required kmax reduces 
also, and then the statistical sampling becomes faster (the dominant cost in the statistical sampling is the C'(2'fc^„^^) 
cost). For example, with kmax = 128, it is possible to reach tinit ~ H; in that case if we take the same I = 10, we 
could reach a time of roughly 18. With k^ax — 128, the statistical sampling would of course be much faster than 
with kmax = 4096. 

Our results support the possibility of either revivals in the order parameter or of a flattening out of the decay of 
peak height. Perhaps at smaller A the system would be closer to mean-field and so the revivals would be more clear; 
however, at the same time, the time before revivals occur is longer in this case which makes the simulations more 
difficult. 

The integrability of the system does not play any role in our method. However, it does play some role in the results. 
Even in the mean-field studied in [11], breaking integrability leads to a large change in the asymptotic behavior of 
the magnetization; the mean-field for the integrable system has a power law decay, while non-integrable perturbations 
lead to a much faster exponential decay and can destroy the revivals if they are strong enough. 
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FIG. 6: Peak heights. Error bars represent samphng error. Black hne is raw data. Red hne is with constant factor added to 
y-value of sampled data to correct for greater variance in the mean as described in text. 
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